This (preferred) form of the ValueAttr option requires you to specify both
the element and the attribute names. This is not only safer, it also allows
the original XML to be reconstructed by C<XMLout()>.
Note: You probably don't want to use this option and the NoAttr option at the
same time.
=head2 Variables => { name => value } I<# in - handy>
This option allows variables in the XML to be expanded when the file is read
(there is no facility for putting the variable names back if you regenerate
XML using C<XMLout()>).
A 'variable' is any text of the form C<${name}> which occurs in an attribute
value or in the text content of an element. If 'name' matches a key in the
supplied hashref, C<${name}> will be replaced with the corresponding value from
the hashref. If no matching key is found, the variable will not be replaced.
Names must match the regex: C<[\w.]+> (ie: only 'word' characters and dots are
allowed).
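As a minimal sketch of the expansion rules above (the element names and the C<log_dir> and C<run_dir> variables are invented for illustration), the following call expands one known variable and leaves an unknown one untouched:

```perl
use strict;
use warnings;
use XML::Simple;

# Single quotes stop Perl itself from interpolating ${log_dir}.
my $xml = '<opt>
  <logfile>${log_dir}/app.log</logfile>
  <pidfile>${run_dir}/app.pid</pidfile>
</opt>';

# Only 'log_dir' is supplied, so ${run_dir} has no matching key.
my $config = XMLin($xml, Variables => { log_dir => '/var/log' });

print "$config->{logfile}\n";   # /var/log/app.log
print "$config->{pidfile}\n";   # ${run_dir}/app.pid
```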
=head2 VarAttr => 'attr_name' I<# in - handy>
In addition to the variables defined using C<Variables>, this option allows
variables to be defined in the XML. A variable definition consists of an
element with an attribute called 'attr_name' (the value of the C<VarAttr>
option). The value of the attribute will be used as the variable name and the
text content of the element will be used as the value. A variable defined in
this way will override a variable defined using the C<Variables> option. For
example:
    XMLin( '<opt>
              <dir name="prefix">/usr/local/apache</dir>
              <dir name="exec_prefix">${prefix}</dir>
              <dir name="bindir">${exec_prefix}/bin</dir>
            </opt>',
            VarAttr => 'name', ContentKey => '-content'
          );
produces the following data structure:
    {
      dir => {
        prefix      => '/usr/local/apache',
        exec_prefix => '/usr/local/apache',
        bindir      => '/usr/local/apache/bin',
      }
    }
=head2 XMLDecl => 1 or XMLDecl => 'string' I<# out - handy>
If you want the output from C<XMLout()> to start with the optional XML
declaration, simply set the option to '1'. The default XML declaration is:
    <?xml version='1.0' standalone='yes'?>
If you want some other string (for example to declare an encoding value), set
the value of this option to the complete string you require.
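The two forms of the option can be compared side by side. This is only a sketch: the data and the encoding declaration are illustrative, and XMLout() does not check that the string you supply is a valid declaration.

```perl
use strict;
use warnings;
use XML::Simple;

# Flat, illustrative data; each key becomes an attribute of <opt>.
my $data = { name => 'alpha', port => 8080 };

# XMLDecl => 1 prepends the default declaration.
my $with_default = XMLout($data, XMLDecl => 1);

# A string value is used as the declaration instead.
my $with_encoding = XMLout(
    $data,
    XMLDecl => '<?xml version="1.0" encoding="UTF-8"?>',
);

print $with_default, "\n", $with_encoding;
```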
=head1 OPTIONAL OO INTERFACE
The procedural interface is both simple and convenient; however, there are a
couple of reasons why you might prefer to use the object oriented (OO)
interface:
=over 4
=item *
to define a set of default values which should be used on all subsequent calls
to C<XMLin()> or C<XMLout()>
=item *
to override methods in B<XML::Simple> to provide customised behaviour
=back
The default values for the options described above are unlikely to suit
everyone. The OO interface allows you to effectively override B<XML::Simple>'s
defaults with your preferred values. It works like this:
First create an XML::Simple parser object with your preferred defaults:
    my $xs = XML::Simple->new(ForceArray => 1, KeepRoot => 1);
then call C<XMLin()> or C<XMLout()> as a method of that object:
    my $ref = $xs->XMLin($xml);
    my $xml = $xs->XMLout($ref);
You can also specify options when you make the method calls and these values
will be merged with the values specified when the object was created. Values
specified in a method call take precedence.
Note: when called as methods, the C<XMLin()> and C<XMLout()> routines may be
called as C<xml_in()> or C<xml_out()>. The method names are aliased so the
only difference is the aesthetics.
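The merging of constructor defaults with per-call options can be sketched as follows. The XML snippet is illustrative; the point is that C<ForceArray> given in the call wins for that call only, while C<KeyAttr> from the constructor still applies.

```perl
use strict;
use warnings;
use XML::Simple;

my $xml = '<opt><item>one</item></opt>';

# Defaults for every call made through this object.
my $xs = XML::Simple->new(ForceArray => 1, KeyAttr => []);

# Uses the constructor defaults: <item> becomes an arrayref.
my $forced = $xs->XMLin($xml);

# ForceArray given in the call overrides the constructor value
# for this call only.
my $plain = $xs->XMLin($xml, ForceArray => 0);

print ref($forced->{item}), "\n";   # ARRAY
print $plain->{item}, "\n";         # one
```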
=head2 Parsing Methods
You can explicitly call one of the following methods rather than rely on the
C<xml_in()> method automatically determining whether the target to be parsed is
a string, a file or a filehandle:
=over 4
=item parse_string(text)
Works exactly like the C<xml_in()> method but assumes the first argument is
a string of XML (or a reference to a scalar containing a string of XML).
=item parse_file(filename)
Works exactly like the C<xml_in()> method but assumes the first argument is
the name of a file containing XML.
=item parse_fh(file_handle)
Works exactly like the C<xml_in()> method but assumes the first argument is
a filehandle which can be read to get XML.
=back
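A short sketch of calling C<parse_string()> directly, using invented XML. Because the method never treats its argument as a filename, there is no guessing about what kind of target was passed:

```perl
use strict;
use warnings;
use XML::Simple;

my $xs = XML::Simple->new(ForceArray => 0, KeyAttr => []);

# A plain string of XML ...
my $from_string = $xs->parse_string('<opt colour="red" />');

# ... or a reference to a scalar holding the XML.
my $xml      = '<opt colour="blue" />';
my $from_ref = $xs->parse_string(\$xml);

print "$from_string->{colour} $from_ref->{colour}\n";   # red blue
```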
=head2 Hook Methods
You can make your own class which inherits from XML::Simple and overrides
certain behaviours. The following methods may provide useful 'hooks' upon
which to hang your modified behaviour. You may find other undocumented methods
by examining the source, but those may be subject to change in future releases.
=over 4
=item handle_options(direction, name => value ...)
This method will be called when one of the parsing methods or the C<XMLout()>
method is called. The initial argument will be a string (either 'in' or 'out')
and the remaining arguments will be name/value pairs.
=item default_config_file()
Calculates and returns the name of the file which should be parsed if no
filename is passed to C<XMLin()> (default: C<$0.xml>).
=item build_simple_tree(filename, string)
Called from C<XMLin()> or any of the parsing methods. Takes either a file name
as the first argument or C<undef> followed by a 'string' as the second
argument. Returns a simple tree data structure. You could override this
method to apply your own transformations before the data structure is returned
to the caller.
=item new_hashref()
When the 'simple tree' data structure is being built, this method will be
called to create any required anonymous hashrefs.
=item sorted_keys(name, hashref)
Called when C<XMLout()> is translating a hashref to XML. This routine returns
a list of hash keys in the order that the corresponding attributes/elements
should appear in the output.
=item escape_value(string)
Called from C<XMLout()>, takes a string and returns a copy of the string with
XML character escaping rules applied.
=item numeric_escape(string)
Called from C<escape_value()>, to handle non-ASCII characters (depending on the
value of the NumericEscape option).
=item copy_hash(hashref, extra_key => value, ...)
Called from C<XMLout()>, when 'unfolding' a hash of hashes into an array of
hashes. You might wish to override this method if you're using tied hashes and
don't want them to get untied.
=back
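As an illustration of the hook approach, here is a minimal sketch of a subclass that overrides C<sorted_keys()>. The package name C<My::ReverseSorted> and the reverse ordering are made up for the example; only the method name and its arguments come from the description above.

```perl
use strict;
use warnings;

package My::ReverseSorted;

use XML::Simple ();
our @ISA = ('XML::Simple');

# Receives the element name and the hashref being serialised;
# returns the keys in the order they should be emitted.
sub sorted_keys {
    my ($self, $name, $hashref) = @_;
    return reverse sort keys %$hashref;
}

package main;

my $xs  = My::ReverseSorted->new(KeyAttr => []);
my $xml = $xs->XMLout({ alpha => 1, beta => 2, gamma => 3 });

print $xml;
```

With this override in place, the attributes of the root element should be written in reverse alphabetical order rather than the default sorted order.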
=head2 Cache Methods
XML::Simple implements three caching schemes ('storable', 'memshare' and
'memcopy'). You can implement a custom caching scheme by implementing
two methods - one for reading from the cache and one for writing to it.
For example, you might implement a new 'dbm' scheme that stores cached data
structures using the L<MLDBM> module. First, you would add a
C<cache_read_dbm()> method which accepted a filename for use as a lookup key
and returned a data structure on success, or undef on failure. Then, you would
implement a C<cache_write_dbm()> method which accepted a data structure and a
filename.
You would use this caching scheme by specifying the option:
    Cache => [ 'dbm' ]
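The same two-method pattern can be sketched with a deliberately trivial scheme. Everything named here (the C<My::CachedSimple> package, the 'demo' scheme and the C<%store> hash) is hypothetical, and unlike a real scheme this one never checks whether the XML file has changed since it was cached.

```perl
use strict;
use warnings;

package My::CachedSimple;

use XML::Simple ();
our @ISA = ('XML::Simple');

# Hypothetical in-process store: filename => parsed structure.
my %store;

# Returns the cached structure, or undef so parsing goes ahead.
sub cache_read_demo {
    my ($self, $filename) = @_;
    return $store{$filename};
}

# Saves the freshly parsed structure under the filename.
sub cache_write_demo {
    my ($self, $data, $filename) = @_;
    $store{$filename} = $data;
    return;
}

package main;

use File::Temp qw(tempfile);

# Caching only applies when parsing a file, so write one first.
my ($fh, $path) = tempfile(SUFFIX => '.xml', UNLINK => 1);
print {$fh} '<opt colour="red" />';
close($fh);

my $xs = My::CachedSimple->new(
    Cache => ['demo'], KeyAttr => [], ForceArray => 0,
);
my $first  = $xs->XMLin($path);   # parsed, then handed to cache_write_demo()
my $second = $xs->XMLin($path);   # expected to come from cache_read_demo()
```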
=head1 STRICT MODE
If you import the B<XML::Simple> routines like this:
    use XML::Simple qw(:strict);
the following common mistakes will be detected and treated as fatal errors:
=over 4
=item *
Failing to explicitly set the C<KeyAttr> option - if you can't be bothered
reading about this option, turn it off with: KeyAttr => [ ]
=item *
Failing to explicitly set the C<ForceArray> option - if you can't be bothered
reading about this option, set it to the safest mode with: ForceArray => 1
=item *
Setting ForceArray to an array, but failing to list all the elements from the
KeyAttr hash.
=item *
Data error - KeyAttr is set to say { part => 'partnum' } but the XML contains
one or more E<lt>partE<gt> elements without a 'partnum' attribute (or nested
element). Note: if strict mode is not set but -w is, this condition triggers a
warning.
=item *
Data error - as above, but non-unique values are present in the key attribute
(eg: more than one E<lt>partE<gt> element with the same partnum). This will
also trigger a warning if strict mode is not enabled.
=item *
Data error - as above, but value of key attribute (eg: partnum) is not a
scalar string (due to nested elements etc). This will also trigger a warning
if strict mode is not enabled.
=back
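A minimal sketch of the first three rules in practice, using an invented E<lt>partE<gt> document. The first call names both options and lists 'part' in C<ForceArray>; the second omits them and is expected to die.

```perl
use strict;
use warnings;
use XML::Simple qw(:strict);

my $xml = '<opt><part partnum="555-1234">widget</part></opt>';

# Both options are given explicitly, and every element named in
# the KeyAttr hash also appears in the ForceArray list.
my $ok = XMLin(
    $xml,
    KeyAttr    => { part => 'partnum' },
    ForceArray => ['part'],
);

# Leaving the options out is a fatal error under :strict.
my $bad = eval { XMLin($xml) };
print "strict mode rejected the call: $@" if $@;

print $ok->{part}->{'555-1234'}->{content}, "\n";   # widget
```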
=head1 SAX SUPPORT
From version 1.08_01, B<XML::Simple> includes support for SAX (the Simple API
for XML) - specifically SAX2.
In a typical SAX application, an XML parser (or SAX 'driver') module generates
SAX events (start of element, character data, end of element, etc) as it parses
an XML document and a 'handler' module processes the events to extract the
required data. This simple model allows for some interesting and powerful
possibilities:
=over 4
=item *
Applications written to the SAX API can extract data from huge XML documents
without the memory overheads of a DOM or tree API.
=item *
The SAX API allows for plug and play interchange of parser modules without
having to change your code to fit a new module's API. A number of SAX parsers
are available with capabilities ranging from extreme portability to blazing
performance.
=item *
A SAX 'filter' module can implement both a handler interface for receiving
data and a generator interface for passing modified data on to a downstream
handler. Filters can be chained together in 'pipelines'.
=item *
One filter module might split a data stream to direct data to two or more
downstream handlers.
=item *
Generating SAX events is not the exclusive preserve of XML parsing modules.
For example, a module might extract data from a relational database using DBI
and pass it on to a SAX pipeline for filtering and formatting.
=back
B<XML::Simple> can operate at either end of a SAX pipeline. For example,
you can take a data structure in the form of a hashref and pass it into a
SAX pipeline using the 'Handler' option on C<XMLout()>:
    use XML::Simple;
    use Some::SAX::Filter;
    use XML::SAX::Writer;
    my $ref = {
      .... # your data here
    };
    my $writer = XML::SAX::Writer->new();
    my $filter = Some::SAX::Filter->new(Handler => $writer);
    my $simple = XML::Simple->new(Handler => $filter);
    $simple->XMLout($ref);
You can also put B<XML::Simple> at the opposite end of the pipeline to take
advantage of the simple 'tree' data structure once the relevant data has been
isolated through filtering:
    use XML::SAX;
    use Some::SAX::Filter;
    use XML::Simple;
    my $simple = XML::Simple->new(ForceArray => 1, KeyAttr => ['partnum']);
    my $filter = Some::SAX::Filter->new(Handler => $simple);
    my $parser = XML::SAX::ParserFactory->parser(Handler => $filter);
    my $ref = $parser->parse_uri('some_huge_file.xml');
    print $ref->{part}->{'555-1234'};
You can build a filter by using an XML::Simple object as a handler and setting
its DataHandler option to point to a routine which takes the resulting tree,
modifies it and sends it off as SAX events to a downstream handler:
    my $writer = XML::SAX::Writer->new();
    my $filter = XML::Simple->new(
      DataHandler => sub {
        my $simple = shift;
        my $data = shift;
        # Modify $data here
        $simple->XMLout($data, Handler => $writer);
      }
    );
    my $parser = XML::SAX::ParserFactory->parser(Handler => $filter);
    $parser->parse_uri($filename);
I<Note: In this last example, the 'Handler' option was specified in the call to
C<XMLout()> but it could also have been specified in the constructor>.
=head1 ENVIRONMENT
If you don't care which parser module B<XML::Simple> uses then skip this
section entirely (it looks more complicated than it really is).
B<XML::Simple> will default to using a B<SAX> parser if one is available or
B<XML::Parser> if SAX is not available.
You can dictate which parser module is used by setting either the environment
variable 'XML_SIMPLE_PREFERRED_PARSER' or the package variable
$XML::Simple::PREFERRED_PARSER to contain the module name. The following rules
are used:
=over 4
=item *
The package variable takes precedence over the environment variable if both
are defined. To force B<XML::Simple> to ignore the environment settings and use
its default rules, you can set the package variable to an empty string.
=item *
If the 'preferred parser' is set to the string 'XML::Parser', then
L<XML::Parser> will be used (or C<XMLin()> will die if L<XML::Parser> is not
installed).
=item *
If the 'preferred parser' is set to some other value, then it is assumed to be
the name of a SAX parser module and is passed to L<XML::SAX::ParserFactory>.
If L<XML::SAX> is not installed, or the requested parser module is not
installed, then C<XMLin()> will die.
=item *
If the 'preferred parser' is not defined at all (the normal default
state), an attempt will be made to load L<XML::SAX>. If L<XML::SAX> is
installed, then a parser module will be selected according to
L<XML::SAX::ParserFactory>'s normal rules (which typically means the last SAX
parser installed).
=item *
If the 'preferred parser' is not defined and B<XML::SAX> is not
installed, then B<XML::Parser> will be used. C<XMLin()> will die if
L<XML::Parser> is not installed.
=back
Note: The B<XML::SAX> distribution includes an XML parser written entirely in
Perl. It is very portable but it is not very fast. You should consider
installing L<XML::LibXML> or L<XML::SAX::Expat> if they are available for your
platform.
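For reference, setting the package variable looks like this. It is a sketch only: the parser names are examples, and the modules you name must actually be installed for a later C<XMLin()> call to succeed.

```perl
use strict;
use warnings;
use XML::Simple;

# Applies to every later XMLin() call in this process and takes
# precedence over XML_SIMPLE_PREFERRED_PARSER in the environment.
$XML::Simple::PREFERRED_PARSER = 'XML::Parser';

# Alternatively, override it for one block only.
{
    local $XML::Simple::PREFERRED_PARSER = 'XML::SAX::PurePerl';
    # XMLin() calls made here would be routed through
    # XML::SAX::ParserFactory with that parser name.
}
```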
=head1 ERROR HANDLING
The XML standard is very clear on the issue of non-compliant documents. An
error in parsing any single element (for example a missing end tag) must cause
the whole document to be rejected. B<XML::Simple> will die with an appropriate
message if it encounters a parsing error.
If dying is not appropriate for your application, you should arrange to call
C<XMLin()> in an eval block and look for errors in $@. eg:
    my $config = eval { XMLin() };
    PopUpMessage($@) if($@);
Note, there is a common misconception that use of B<eval> will significantly
slow down a script. While that may be true when the code being eval'd is in a
string, it is not true of code like the sample above.
=head1 EXAMPLES
When C<XMLin()> reads the following very simple piece of XML:
    <opt username="testuser" password="frodo"></opt>
it returns the following data structure:
    {
      'username' => 'testuser',
      'password' => 'frodo'
    }
The identical result could have been produced with this alternative XML:
    <opt username="testuser" password="frodo" />
Or this (although see 'ForceArray' option for variations):
    <opt>
      <username>testuser</username>
      <password>frodo</password>
    </opt>
Repeated nested elements are represented as anonymous arrays:
    <opt>
      <person firstname="Joe" lastname="Smith">
        <email>joe@smith.com</email>
        <email>jsmith@yahoo.com</email>
      </person>
      <person firstname="Bob" lastname="Smith">
        <email>bob@smith.com</email>
      </person>
    </opt>

    {
      'person' => [
        {
          'email' => [
            'joe@smith.com',
            'jsmith@yahoo.com'
          ],
          'firstname' => 'Joe',
          'lastname' => 'Smith'
        },
        {
          'email' => 'bob@smith.com',
          'firstname' => 'Bob',
          'lastname' => 'Smith'
        }
      ]
    }
Nested elements with a recognised key attribute are transformed (folded) from
an array into a hash keyed on the value of that attribute (see the C<KeyAttr>